Abstract: With the advent of smartphone technology, it has become possible to conceive of entirely new classes of applications. Social swarming, in which users armed with smartphones are directed by a central director to report on events in the physical world, has several real-world applications: search and rescue, coordinated fire-fighting, and the DARPA balloon hunt challenge. In this paper, we focus on the following problem: how does the director optimize the selection of reporters to deliver credible corroborating information about an event? We first propose a model, based on common intuitions of believability, of the credibility of information. We then cast the problem posed above as a discrete optimization problem, and introduce optimal centralized solutions and an approximate solution amenable to decentralized implementation whose performance is, on average, about 20% off the optimal (on real-world datasets derived from Google News) while being 3 orders of magnitude more computationally efficient. More interestingly, a time-averaged version of the problem is amenable to a novel stochastic utility optimization formulation and can be solved optimally, in some cases yielding decentralized solutions. To our knowledge, we are the first to propose and explore the problem of extracting credible information from a network of smartphones.

I. Introduction 

With the advent of smartphone technology, it has become possible to conceive of entirely new classes of applications. Recent research has considered personal reflection [1], social sensing [2], lifestyle and activity detection [3], and advanced speech and image processing applications [4]. These applications are enabled by the programmability of smartphones, their considerable computing power, and the presence of a variety of sensors on-board.

In this paper, we consider a complementary class of potential applications, enabled by the same capabilities, that we call social swarming. In this class of applications, a swarm of users, each armed with a smartphone, cooperatively and collaboratively engages in one or more tasks. These users often receive instructions from, or send reports (a video clip, an audio report, or a text message) to, a swarm director. Because directors have a global view of information from different users, they are able to manage the swarm efficiently to achieve the task's objectives. Beyond the obvious military applications, there are several civilian ones: search and rescue, coordinated fire-fighting, and the DARPA balloon hunt challenge¹.

In these applications, an important challenge is to obtain credible (or believable) information. In general, sociologists have observed three ways in which believable information might be obtained [5]: homophily, by which people believe like-minded people; test-and-validate, by which the recipient of information tests the correctness of the information; and corroboration, where the belief in information is reinforced by several sources reporting the same (or similar) information. The process by which humans believe information is exceedingly complex, and an extended discussion is beyond the scope of this paper.

Instead, our focus is on simple and tractable models for corroboration in social swarming type applications. Specifically, the scenario we consider is the following. Suppose that an event (say, a balloon sighting) is reported to a swarm director. The director would like to corroborate this report by obtaining reports from other swarm members: which reporters should she select? We call this the corroboration pull problem. Clearly, asking every swarm member to report is unnecessary, at best: swarms can have several hundred participants, and a video report from each of them can overwhelm the network. Thus, intuitively, the director would like to selectively request reports from a subset of swarm members, while managing the network resources utilized.

In this paper, we formalize this intuition and study the space of corroboration pull formulations. Our contributions are three-fold. 1) We introduce a model for the credibility of reports. This model quantifies common intuitions about the believability of information: for example, that video is more believable than text, and that a reporter closer to an event is more believable than one further away (Section II). 2) We then cast the one-shot corroboration pull problem as a discrete optimization problem and show that it reduces to a multiple-choice knapsack problem with weakly-polynomial optimal solutions. We develop strongly-polynomial, but inefficient, solutions for the case when the number of formats is fixed, and an optimal algorithm for the case of two formats. Finally, we derive an approximation algorithm for the general case that leverages the structure of our credibility model. This algorithm is about 20% off the optimal, but its running time is 2-3 orders of magnitude faster than that of the optimal algorithm, a difference that can decide between winning and losing in, say, a balloon hunt. 3) We then show that, interestingly, the renewals version of the problem, where the goal is to optimize corroboration pull in a time-averaged sense, can be solved optimally, while admitting a completely decentralized solution.

II. Terminology, Model, and Optimization Formulation 

As smartphones proliferate, social swarming applications are likely to become increasingly common. In this paper, we consider a constrained form of a social swarming application in which N participants, whom we call reporters, collaboratively engage in a well-defined task. Each reporter is equipped with a smartphone and reports directly to a swarm director using the 3G/EDGE network. A reporter may be either a human being or a sensor (static, such as a fixed camera, or mobile, such as a robot). A director (either a human being or analytic software) assimilates these reports and may perform some actions based on the content of these combined reports.

Our setting is simplified in many ways. For now, we consider a situation where reporters cooperate, and are therefore benign: we leave a consideration of malicious reporting to future work. Similarly, we have implicitly assumed an always-available 3G/EDGE network, and have not considered network dynamics (such as the availability of opportunistic WiFi networks). We believe this assumption can be relaxed using techniques from our prior work [6], but have left an exploration of this to future work. Despite these simplifications, we show that the problem space has sufficient richness in and of itself.

Each reporter reports on an event. The nature of the event 
depends upon the social swarming application: for example, 
in a search and rescue operation, an event corresponds to 
the sighting of an individual who needs to be rescued; in 
the balloon hunt, an event is the sighting of a balloon. 
Events occur at a particular location, and multiple events may 
occur concurrently either at the same location or at different 
locations. 

Reporters can transmit reports of an event using one of several formats, such as a video clip, an audio clip, or a text message describing what the reporter sees. Each report is a form of evidence for the existence of the event. As we discuss below, different forms of evidence are "believed" to different extents. In general, we assume that each reporter is capable of generating R different report formats, denoted by $f_j$, for $1 \le j \le R$. However, different formats have different costs to the network: for example, video or audio could consume significantly more transmission resources than, say, text. We denote by $e_j$ the cost of a report in format $f_j$. For ease of exposition, we assume that reports are of a fixed size, so that all reports of a certain format have the same cost (our results can be easily generalized to the case where report costs are proportional to their length).

Finally, reporters can be mobile, but we assume that the director is aware of the location of each reporter. In our problem formulation, we ignore the cost of sending periodic location updates to the director. In practice, this may be a reasonable assumption for three reasons. First, the cost of location updates may be amortized over other context-aware applications that may be executing on the smartphone. Second, although this cost may be significant, it adds a fixed cost to our formulations and does not affect the results we present in the paper. Finally, the absolute cost of the location updates themselves is significantly less than, for example, the cost of video transmissions.

Now, suppose that the director in a swarming application has heard, through out-of-band channels or from a single reporter, of the existence of an event E at location L. To verify this report, the director would like to request corroborating reports from other reporters in the vicinity of L. Which reporters should she get corroborating reports from? What formats should those reporters use?

To understand this, recall that the goal of corroboration is 
to increase the director's belief in the occurrence of the event. 
How much should the director believe a specific reporter? Or, 
equivalently, what is the credibility of a report? 

In general, this is a complex sociological and psychological 
question which, at the moment, is not objectively quantifiable. 
However, in this paper, we model the credibility of the 
report using two common intuitions about credibility. The first 
intuition is based on the maxim "seeing is believing": a video 
report is more credible than a text report. We extend this 
maxim in our model to incorporate other formats, like audio: 
audio is generally less credible than video (because, while 
it gives some context about an event, video contains more 
context), but more credible than text (for a similar reason). 
Of course, this is an assumption: video and audio can be just 
as easily doctored as text. Recall that our model, for now, 
assumes cooperative non-malicious elements: in future work, 
we plan to discuss how to model the credibility of reports 
in the presence of malicious elements. Moreover, as we shall 
show later, many of our results are insensitive to the exact 
choice of the credibility model. 

Our second intuition is based on the often-heard statement "I'll believe someone who was there", suggesting that proximity of the reporter to an event increases the credibility of the report. More precisely, a report A generated by a reporter at distance $d_a$ from an event has a higher credibility than a report B generated by a reporter at distance $d_b$, if $d_a < d_b$. This is also a simplified model: the real world is more complex, since the complexity of the terrain, or line of sight, may matter more than geometric distance.

While there are many different ways in which we can objectively quantify the credibility of a report given these intuitions, we picked the following formulation. Let $S_i$ be the position of reporter $i$, $L$ be the position of event $E$, and $c_{ij}(S_i, L)$ be the credibility of the report generated by reporter $i$ when report format $f_j$ is used. We define $c_{ij}(S_i, L)$ as:

$$
c_{ij}(S_i, L) =
\begin{cases}
\dfrac{\gamma_j}{d(S_i, L)^{\delta_j}}, & \text{if } h_0 < d(S_i, L), \\[2mm]
\dfrac{\gamma_j}{h_0^{\delta_j}}, & \text{if } d(S_i, L) \le h_0,
\end{cases}
\tag{1}
$$

with $1 \le j \le R$, $\gamma_1 \le \gamma_2 \le \cdots \le \gamma_R$, and $\delta_1 \ge \delta_2 \ge \cdots \ge \delta_R$. Here, $d(\cdot)$ is the Euclidean distance between points, $h_0$ is a certain minimum distance that avoids division by zero and bounds the maximum credibility, $\gamma_j$ is a constant of proportionality that sets the maximum achievable credibility of report format $f_j$, and the credibility decays according to a power law with exponent $\delta_j$ when format $f_j$ is used.

Our credibility model incorporates the two intuitions described above as follows. The intuition that credibility depends on proximity is captured by the power-law decay with distance. The intuition that video is more credible than text is captured by giving video a larger $\gamma$ and a smaller exponent $\delta$.
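As an illustration only, model (1) can be written as a small Python function. This is a minimal sketch: the function names and the text/video parameters in the example (γ = 1, δ = 2 for text; γ = 2, δ = 1 for video; h_0 = 1) are hypothetical values chosen for the example.

```python
import math


def distance(p, q):
    """Euclidean distance d(S_i, L) between two points given as coordinate tuples."""
    return math.dist(p, q)


def credibility(d, gamma_j, delta_j, h0):
    """Credibility of a format-f_j report from distance d, following model (1).

    For d <= h0 the value is capped at gamma_j / h0**delta_j; beyond h0 it decays
    as the power law gamma_j / d**delta_j.
    """
    if d < 0 or h0 <= 0:
        raise ValueError("need d >= 0 and h0 > 0")
    return gamma_j / max(d, h0) ** delta_j


# Hypothetical parameters: text (gamma=1, delta=2) vs. video (gamma=2, delta=1), h0 = 1.
d = distance((0.0, 0.0), (3.0, 4.0))      # 5.0
text = credibility(d, 1.0, 2.0, 1.0)      # 1 / 25
video = credibility(d, 2.0, 1.0, 1.0)     # 2 / 5
```

With these numbers, at 5 distance units the video report is ten times as credible as the text report (0.4 versus 0.04), and any reporter within $h_0$ of the event receives the capped value.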

This model can be extended to incorporate noise or confusion. For example, poor visibility or audible noise near a reporter may, depending upon the format used, reduce the believability of a report. The intensity of point sources of noise can be modeled as a function that decays with distance:

$$
G_l(S_i, O_l) = \frac{1}{\left[1 + d(S_i, O_l)\right]^{1/\sigma_l}}
\tag{2}
$$

where $S_i$ is the position of reporter $i$, $O_l$ is the position of noise source $l$, and $\sigma_l$ represents the strength and effective range of noise source $l$. Then, if for reporter $i$ and event $E$ the original credibility without noise is $c_{ij}(S_i, L)$, the credibility with $X$ noise sources should be

$$
c'_{ij}(S_i, L) = c_{ij}(S_i, L) \prod_{p=1}^{X} \left(1 - G_p(S_i, O_p)\right)
\tag{3}
$$

Noise sources effectively increase the distance of the reporter from the event, reducing his or her credibility. As we show later, our solutions can incorporate this form of noise without any modification.
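A minimal Python sketch of the noise attenuation in (3), assuming that each point source has intensity $G_l = 1/[1 + d(S_i, O_l)]^{1/\sigma_l}$ at the reporter; the function names are hypothetical. Under this assumption each factor $1 - G_p$ lies in $[0, 1)$, so noise can only lower credibility, and a reporter standing exactly at a noise source contributes nothing.

```python
import math


def noise_intensity(reporter, source, sigma):
    """G_l(S_i, O_l): intensity at `reporter` of a point noise source at `source`.

    Assumes the decay 1 / [1 + d]**(1 / sigma): a larger sigma gives a stronger,
    longer-range source.  G equals 1 at the source and falls toward 0 far away.
    """
    return 1.0 / (1.0 + math.dist(reporter, source)) ** (1.0 / sigma)


def noisy_credibility(c_ij, reporter, sources):
    """c'_ij from (3): scale the noise-free credibility by (1 - G_p) per noise source.

    `sources` is a list of (position, sigma) pairs.
    """
    factor = 1.0
    for position, sigma in sources:
        factor *= 1.0 - noise_intensity(reporter, position, sigma)
    return c_ij * factor
```

Because the result is just a rescaled credibility value, it can be fed to any of the solvers below in place of $c_{ij}$.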

Although we have assigned objective quantitative values to credibility, belief or disbelief is often qualitative and subjective. Thus, we don't expect swarm directors to make decisions based on the exact values of credibility of different reports, but rather to operate in one of two modes: a) ask the network to deliver corroborating reports whose total credibility is above a certain threshold, while minimizing cost, or b) obtain as much corroborating information as they can get from the network for a given cost. We study these two formulations, called MinCost and MaxCred, respectively.

Before doing so, two questions must be answered: What is the value of the credibility of a collection of corroborating reports? What is the physical or intuitive meaning of a threshold on the credibility? For the first question, there are many possible answers, and we consider two. With an additive corroboration function, the total credibility is simply the sum of the individual credibilities. More generally, with a monotonically-increasing corroboration function, the total credibility increases monotonically as a function of the sum of the individual credibilities. The second question is important because its answer can help directors set thresholds appropriately. The intuition for a particular threshold value $C$ can be explained as follows. Suppose a director would be subjectively satisfied with 3 corroborating video clips from reporters within 10 m of an event. One could translate this subjective specification into a threshold value by simply taking the sum of the credibilities of 3 video reports from a distance of 10 m.
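To make this translation concrete, here is a minimal Python sketch under the additive corroboration function. The helper names are hypothetical, and the video parameters (γ = 1, δ = 0.5, h_0 = 1) are assumed for illustration.

```python
def credibility_threshold(num_reports, gamma, delta, distance, h0):
    """Threshold C equal to the summed credibility of `num_reports` reports in one
    format, each sent from `distance`, under model (1) and additive corroboration."""
    return num_reports * gamma / max(distance, h0) ** delta


def meets_threshold(credibilities, C):
    """Additive corroboration: a collection suffices if its credibilities sum to at least C."""
    return sum(credibilities) >= C


# "Three video clips from within 10 m" with hypothetical video parameters gamma=1, delta=0.5, h0=1.
C = credibility_threshold(3, 1.0, 0.5, 10.0, 1.0)   # 3 / sqrt(10), about 0.949
```

Any combination of reports whose credibilities add up to at least this value is then deemed as convincing as the three video clips, for example two closer reports of credibility 0.5 each.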

In the next two sections, we formally define MinCost and MaxCred, and then consider two problem variants: a one-shot problem, which seeks to optimize reporting for individual events, and a renewals problem, which optimizes reporting over a sequence of event arrivals.

TABLE I
NOTATION
$N$: the total number of available reporters
$c_{ij}$: the short form of (1) for a given event
$R$: the total number of report formats
$e_j$: the cost when using report format $f_j$
$C$: the target credibility in MinCost
$A$: the dynamic programming process of MinCost
$B$: the cost budget in MaxCred
$D$: the dynamic programming process of MaxCred

III. The One-Shot Problem 

In this section, we formally state the MinCost and MaxCred formulations for the additive corroboration function and in the absence of noise, discuss their complexity, develop optimal solutions for them, and then explore an approximation algorithm that leverages the structure of the credibility function for efficiency. We conclude with a discussion of extensions to the formulations that incorporate the impact of noise sources and monotonically-increasing corroboration functions. Our exposition follows the notation developed in the previous section and summarized in Table I.

A. MinCost and MaxCred: Problem Formulation and Complexity

1) Problem Formulations: Recall that, in Section II, we informally defined the MinCost problem to be: what is the minimum cost that guarantees total credibility $C > 0$? MinCost can be stated formally as an optimization problem:

$$
\begin{aligned}
\text{Minimize:}\quad & \sum_{i=1}^{N}\sum_{j=1}^{R} x_{i,j}\, e_j \\
\text{Subject to:}\quad & \sum_{i=1}^{N}\sum_{j=1}^{R} x_{i,j}\, c_{i,j} \ge C \\
& x_{i,j} \in \{0, 1\}, \quad \forall i \in \{1, \dots, N\},\ \forall j \in \{1, \dots, R\} \\
& \sum_{j=1}^{R} x_{i,j} \le 1, \quad \forall i \in \{1, \dots, N\}
\end{aligned}
\tag{4}
$$

where $x_{i,j}$ is a binary variable that is 1 if reporter $i$ uses format $f_j$, and 0 otherwise.

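Analogously, we can formulate MaxCred (the maximum credibility that can be achieved for a cost budget of $B > 0$) as the following optimization problem:

$$
\begin{aligned}
\text{Maximize:}\quad & \sum_{i=1}^{N}\sum_{j=1}^{R} x_{i,j}\, c_{i,j} \\
\text{Subject to:}\quad & \sum_{i=1}^{N}\sum_{j=1}^{R} x_{i,j}\, e_j \le B \\
& x_{i,j} \in \{0, 1\}, \quad \forall i \in \{1, \dots, N\},\ \forall j \in \{1, \dots, R\} \\
& \sum_{j=1}^{R} x_{i,j} \le 1, \quad \forall i \in \{1, \dots, N\}
\end{aligned}
\tag{5}
$$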
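For very small instances, formulations (4) and (5) can be checked by exhaustive enumeration, which is a convenient reference when testing faster algorithms. The following Python sketch is a hypothetical helper, not one of the algorithms of this paper: it enumerates all $(R+1)^N$ ways of giving each reporter one format or "idle", using 0-based format indices.

```python
from itertools import product


def brute_force(c, e, *, min_cred=None, budget=None):
    """Exhaustive reference solver for MinCost (4) or MaxCred (5) on tiny instances.

    c[i][j]: credibility of reporter i with format j;  e[j]: cost of format j.
    Each reporter picks one format index or None (idle), which enforces sum_j x_ij <= 1.
    Pass min_cred=C for MinCost or budget=B for MaxCred.  Time is O((R + 1)**N).
    Returns (objective, choice) or None if no assignment is feasible.
    """
    if (min_cred is None) == (budget is None):
        raise ValueError("pass exactly one of min_cred or budget")
    best = None
    for choice in product([None, *range(len(e))], repeat=len(c)):
        cost = sum(e[j] for j in choice if j is not None)
        cred = sum(c[i][j] for i, j in enumerate(choice) if j is not None)
        if min_cred is not None:                            # MinCost: cheapest feasible
            if cred >= min_cred and (best is None or cost < best[0]):
                best = (cost, choice)
        elif cost <= budget and (best is None or cred > best[0]):   # MaxCred
            best = (cred, choice)
    return best
```

For example, with two reporters, `c = [[1.0, 3.0], [0.5, 2.0]]` and `e = [1, 4]`, a budget of 5 gives `(3.5, (1, 0))`, while a target of 3.0 gives `(4, (1, None))`.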
2) On the Complexity of MinCost and MaxCred: If, in the above formulation, the cost also depends on the identity of the reporter (and is therefore denoted by $e_{ij}$), the MaxCred problem can be shown to be a special instance of the Multiple-Choice Knapsack Problem (MCKP) [7]. Moreover, the special case of one format (with $e_{ij} = e_i$) is the well-known Knapsack Problem (KP), which is NP-hard. Even when the cost depends only on the format (i.e., $e_{ij} = e_j$), the problems remain hard, as stated in the following theorem, whose proof (omitted for brevity) uses a reduction from the original Knapsack problem.

Theorem 3.1: MinCost and MaxCred are NP-Hard. 

B. Optimal Solutions 

Despite Theorem 3.1, it is instructive to consider optimal solutions for the two problems, for two reasons. First, for many social swarming problem instances, the problem sizes may be small enough that optimal solutions are practical. Second, optimal solutions can be used to calibrate an approximation algorithm that we discuss later. In this section, we discuss two classes of optimal solutions for MinCost and MaxCred, with different tradeoffs: one based on dynamic programming, and another based on a min-cost flow formulation.

1) Dynamic Programming: Since there exist optimal, weakly-polynomial algorithms for MCKP, it is natural that similar algorithms exist for MinCost and MaxCred. We describe these algorithms for completeness, since we use them in a later evaluation.

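For MinCost (4), we can write $y_{i,j} = 1 - x_{i,j}$, where $y_{i,j} \in \{0, 1\}$, and then we have:

$$
\begin{aligned}
& \sum_{i=1}^{N}\sum_{j=1}^{R} e_j \;-\; \text{Maximize}\ \sum_{i=1}^{N}\sum_{j=1}^{R} y_{i,j}\, e_j \\
\text{Subject to:}\quad & \sum_{i=1}^{N}\sum_{j=1}^{R} y_{i,j}\, c_{i,j} \le \sum_{i=1}^{N}\sum_{j=1}^{R} c_{i,j} - C = W \\
& y_{i,j} \in \{0, 1\}, \quad \forall i \in \{1, \dots, N\},\ \forall j \in \{1, \dots, R\} \\
& \sum_{j=1}^{R} y_{i,j} \ge R - 1, \quad \forall i \in \{1, \dots, N\}
\end{aligned}
\tag{6}
$$

where the minimization problem (4) has been transformed into a maximization problem, and the notation in (6) emphasizes that the first term in the total cost, $\sum_{i=1}^{N}\sum_{j=1}^{R} e_j$, does not depend on the $y_{i,j}$ variables to be optimized. For a given event, the sum of the $c_{i,j}$ values is a constant, and so $W$ is also a constant.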

This optimization problem can be solved by a dynamic programming approach if we assume all $c_{i,j}$ values are truncated to a certain decimal precision, so that $c_{i,j} \in \{0, \xi, 2\xi, \dots\}$, where $\xi$ is a discretization unit. Then, for any binary $y_{i,j}$ values that meet the constraints of the above problem, the sum $\sum_{i=1}^{N}\sum_{j=1}^{R} y_{i,j} c_{i,j}$ takes values in the set $\mathcal{W} \triangleq \{0, \xi, 2\xi, \dots, W\}$. Note that the cardinality $|\mathcal{W}|$ depends on $N$, $R$, the $c_{i,j}$ values, and the discretization unit $\xi$. Now define $A(l, s)$ as the optimal value of the subproblem of selecting reporters in the set $\{1, \dots, l\}$ subject to a constraint $s$, with $A(0, s) = 0$ for $s \ge 0$ and $A(l, s) = -\infty$ for $s < 0$. Assuming the $A(l, s)$ values are known for a particular $l$, we recursively compute $A(l + 1, s)$ for all $s \in \mathcal{W}$ by:

$$
A(l+1, s) = \max\left[\phi^{(0)}(l, s),\ \phi^{(1)}(l, s),\ \dots,\ \phi^{(R)}(l, s)\right]
\tag{7}
$$

where $\phi^{(k)}(l, s)$ is defined for $k \in \{0, 1, \dots, R\}$ as:

$$
\phi^{(k)}(l, s) = A\Big(l,\ s - \sum_{m \ne k} c_{l+1,m}\Big) + \sum_{m \ne k} e_m,
$$

with both sums taken over $m \in \{1, \dots, R\}$. This can be understood as follows: the value $\phi^{(k)}(l, s)$ is the objective value of (6) obtained when reporter $l + 1$ uses option $k \in \{0, 1, \dots, R\}$ and reporters $1, \dots, l$ are then allocated according to the optimal solution $A(l, s - \sum_{m \ne k} c_{l+1,m})$, which corresponds to a smaller budget. Note that option $k \in \{1, \dots, R\}$ corresponds to reporter $l + 1$ using a particular format (so that $y_{l+1,k} = 0$ for option $k$ and $y_{l+1,m} = 1$ for all $m \ne k$), and option $k = 0$ corresponds to reporter $l + 1$ remaining idle (so that $y_{l+1,m} = 1$ for all $m$). The time complexity of this dynamic programming algorithm, called MinCost-DP, is $O(NR|\mathcal{W}|)$.
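A minimal Python sketch of MinCost-DP, assuming the credibilities and the target $C$ have already been discretized to integer multiples of $\xi$ (so that $\mathcal{W} = \{0, 1, \dots, W\}$ in units of $\xi$). It evaluates the $R + 1$ options of recursion (7) for every state and then backtracks to recover each reporter's choice. The function name, the 0-based format indices and the use of $-\infty$ for unreachable states are choices made for this sketch.

```python
NEG_INF = float("-inf")


def mincost_dp(c, e, C):
    """MinCost-DP sketch: recursion (7) on the transformed problem (6), with y = 1 - x.

    c[i][j]: credibility of reporter i with format j, as an integer number of
             discretization units (already truncated to multiples of the unit).
    e[j]:    cost of format j (0-based format indices).   C: target, same integer units.
    Returns (min_cost, formats), formats[i] being a format index or None (idle),
    or None when no assignment reaches C.
    """
    n, r, e_sum = len(c), len(e), sum(e)
    W = sum(map(sum, c)) - C                 # right-hand side of the constraint in (6)
    if W < 0:
        return None
    A = [[0] * (W + 1)]                      # A(0, s) = 0 for every s >= 0
    pick = []                                # option k chosen for each (l + 1, s)
    for row in c:
        prev, cur, arg = A[-1], [NEG_INF] * (W + 1), [0] * (W + 1)
        c_sum = sum(row)
        for s in range(W + 1):
            for k in range(r + 1):           # k = 0: idle; k >= 1: use format k - 1
                used = c_sum - (row[k - 1] if k else 0)   # sum_{m != k} c_{l+1,m}
                gain = e_sum - (e[k - 1] if k else 0)     # sum_{m != k} e_m
                if used <= s and prev[s - used] + gain > cur[s]:
                    cur[s], arg[s] = prev[s - used] + gain, k
        A.append(cur)
        pick.append(arg)
    if A[-1][W] == NEG_INF:
        return None
    formats, s = [None] * n, W               # backtrack from A(N, W)
    for l in range(n - 1, -1, -1):
        k = pick[l][s]
        formats[l] = k - 1 if k else None
        s -= sum(c[l]) - (c[l][k - 1] if k else 0)
    return n * e_sum - A[-1][W], formats
```

The returned cost undoes the transformation in (6): it is $N \sum_j e_j - A(N, W)$. For example, `mincost_dp([[1, 3], [1, 2]], [1, 4], 3)` returns `(4, [1, None])`: a single report from the first reporter in the more expensive format meets the target.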

Similarly, MaxCred can be solved using dynamic programming, yielding an algorithm we label MaxCred-DP:

$$
\begin{aligned}
D(l+1, s) &= \max\left[D(l, s),\ p^{(1)}(l, s),\ p^{(2)}(l, s),\ \dots,\ p^{(R)}(l, s)\right], \\
p^{(k)}(l, s) &= D(l, s - e_k) + c_{l+1,k}, \quad k \in \{1, \dots, R\},
\end{aligned}
\tag{8}
$$

with complexity $O(NR|\mathcal{B}|)$, where $\mathcal{B} = \{0, \eta, 2\eta, \dots, B\}$ and $\eta$ is a discretization unit.
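Recursion (8) admits an equally direct sketch in Python, assuming integer costs expressed in units of $\eta$. The function name and 0-based format indices are hypothetical, and idle reporters are represented by `None`.

```python
def maxcred_dp(c, e, B):
    """MaxCred-DP sketch following recursion (8).

    c[i][j]: credibility of reporter i with format j;  e[j]: cost of format j as an
    integer number of discretization units;  B: budget in the same units.
    Returns (max_credibility, formats) with formats[i] a format index or None (idle).
    """
    n = len(c)
    D = [[0.0] * (B + 1)]                      # D(0, s) = 0: no reporters, no credibility
    pick = []
    for row in c:
        prev = D[-1]
        cur, arg = prev[:], [None] * (B + 1)   # option D(l, s): reporter l + 1 stays idle
        for s in range(B + 1):
            for k, cost in enumerate(e):       # p^(k)(l, s) = D(l, s - e_k) + c_{l+1,k}
                if cost <= s and prev[s - cost] + row[k] > cur[s]:
                    cur[s], arg[s] = prev[s - cost] + row[k], k
        D.append(cur)
        pick.append(arg)
    formats, s = [None] * n, B                 # backtrack from D(N, B)
    for l in range(n - 1, -1, -1):
        formats[l] = pick[l][s]
        if formats[l] is not None:
            s -= e[formats[l]]
    return D[-1][B], formats
```

For instance, `maxcred_dp([[1.0, 3.0], [0.5, 2.0]], [1, 4], 5)` returns `(3.5, [1, 0])`: the first reporter uses the expensive format and the second the cheap one, exhausting the budget.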

2) Min-Cost Flow: For a fixed number of formats, it is possible to define strongly-polynomial, but not necessarily efficient, optimal algorithms for MinCost and MaxCred. These solutions are based on minimum-cost flow algorithms [8]. Define $a_j$ to be the number of reporters reporting with format $f_j$. Define a report vector to be $(a_1, a_2, \dots, a_R)$, and an $(a_1, a_2, \dots, a_R)$-assignment to be an assignment of formats to reporters with $a_j$ reporters reporting with format $f_j$ for each $j \in \{1, \dots, R\}$. We shall find an $(a_1, a_2, \dots, a_R)$-assignment of formats to reporters of maximum credibility.

We shall do so by transforming this problem into a min-cost flow problem and applying a min-cost flow algorithm to obtain the assignment of maximum credibility, in the following manner. Assign nodes for each of the reporters and each of the $R$ formats. Form a complete bipartite graph between the reporters and formats. Assign the edge connecting reporter $i$ to format $f_j$ max capacity 1, min capacity 0, and cost $\max_{i' \in \{1, \dots, N\}} \max_{j' \in \{1, \dots, R\}} c_{i',j'} - c_{i,j}$. Since every feasible flow uses the same number $\sum_j a_j$ of reporter-format edges, minimizing the sum of such costs maximizes the sum of the corresponding $c_{i,j}$. Also, create a source node and a sink node. Connect the source node to each of the reporter nodes and give each edge max capacity 1, min capacity 0, and cost 0. Connect each of the format nodes to the sink node, and give the edge from format $f_j$ max capacity $a_j$, min capacity $a_j$, and cost 0. We shall call this network the credibility network. This network has $O(N + R)$ vertices and $O(NR)$ edges.

This construction ensures that applying a min-cost flow algorithm to the credibility network gives a minimum-cost, and hence maximum-credibility, $(a_1, \dots, a_R)$-assignment of formats to reporters. Using this construction, it is fairly easy to define an optimal algorithm for MaxCred, which we call MaxCred-MCF. Because all reports in a given format have the same cost, the cost of an assignment is determined by its report vector alone. In this algorithm, we therefore simply enumerate all possible report vectors $(a_1, \dots, a_R)$, compute the maximum-credibility assignment for each, and select the highest total credibility assignment that satisfies the cost budget $B$. In a similar way, one can define MinCost-MCF.
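The following self-contained Python sketch builds the credibility network for each report vector and solves it with a simple successive-shortest-path min-cost flow (Bellman-Ford on the residual graph). This is deliberately not the Enhanced Capacity Scaling algorithm cited in this section; it is a swapped-in, easier-to-read technique that is adequate for small $N$ and $R$, and all names are hypothetical. The lower bound $a_j$ on each format-to-sink edge is enforced implicitly: those edges have capacity $a_j$ and exactly $\sum_j a_j$ units must flow, so every one of them is saturated.

```python
from itertools import product

INF = float("inf")


def min_cost_flow(num_nodes, edges, src, sink, need):
    """Send exactly `need` units from src to sink at minimum total cost.

    Successive shortest paths: Bellman-Ford on the residual graph, then push one
    unit along the cheapest path (correct for integer capacities).
    edges: list of (u, v, capacity, cost).  Returns (cost, flow on each edge) or None.
    """
    adj = [[] for _ in range(num_nodes)]
    to, cap, cost = [], [], []
    for u, v, capacity, w in edges:            # arc 2k is edge k, arc 2k + 1 its residual
        for a, b, cp, cw in ((u, v, capacity, w), (v, u, 0, -w)):
            adj[a].append(len(to))
            to.append(b)
            cap.append(cp)
            cost.append(cw)
    total = 0.0
    for _ in range(need):
        dist, via = [INF] * num_nodes, [-1] * num_nodes
        dist[src] = 0.0
        for _ in range(num_nodes - 1):         # residual arcs may have negative cost
            changed = False
            for u in range(num_nodes):
                if dist[u] == INF:
                    continue
                for arc in adj[u]:
                    if cap[arc] > 0 and dist[u] + cost[arc] < dist[to[arc]]:
                        dist[to[arc]] = dist[u] + cost[arc]
                        via[to[arc]] = arc
                        changed = True
            if not changed:
                break
        if dist[sink] == INF:
            return None                        # the required flow cannot be routed
        node = sink
        while node != src:                     # push one unit back along the path
            arc = via[node]
            cap[arc] -= 1
            cap[arc ^ 1] += 1
            node = to[arc ^ 1]
        total += dist[sink]
    return total, [cap[2 * k + 1] for k in range(len(edges))]


def maxcred_mcf(c, e, B):
    """MaxCred-MCF sketch: enumerate report vectors (a_1, ..., a_R) within budget B and
    solve the credibility network for each.  Returns (credibility, formats)."""
    n, r = len(c), len(e)
    big_m = max(max(row) for row in c)         # max_{i,j} c_ij keeps arc costs >= 0
    src, sink = 0, n + r + 1                   # reporters 1..n, formats n+1..n+r
    best = (0.0, [None] * n)                   # the empty (all-idle) assignment
    for a in product(range(n + 1), repeat=r):
        # The cost of an assignment depends only on its report vector.
        if not 0 < sum(a) <= n or sum(aj * ej for aj, ej in zip(a, e)) > B:
            continue
        edges = [(src, 1 + i, 1, 0.0) for i in range(n)]
        edges += [(1 + i, 1 + n + j, 1, big_m - c[i][j]) for i in range(n) for j in range(r)]
        edges += [(1 + n + j, sink, a[j], 0.0) for j in range(r)]
        result = min_cost_flow(n + r + 2, edges, src, sink, sum(a))
        if result is None:
            continue
        flow_cost, flows = result
        cred = sum(a) * big_m - flow_cost
        if cred > best[0]:
            formats = [None] * n
            for idx in range(n, n + n * r):    # reporter -> format edges
                if flows[idx]:
                    i, j = divmod(idx - n, r)
                    formats[i] = j
            best = (cred, formats)
    return best
```

On the small instance `c = [[1.0, 3.0], [0.5, 2.0]]`, `e = [1, 4]`, `B = 5`, this returns a credibility of 3.5 (up to floating-point rounding) with formats `[1, 0]`, which is also the optimum found by exhaustive search.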

There are $O(N^{R-1})$ possible report vectors. The Enhanced Capacity Scaling algorithm [8] solves the minimum cost flow problem in time $O(|E| \log |V| (|E| + |V| \log |V|))$. Thus, the Enhanced Capacity Scaling algorithm runs in time $O(NR \log(N + R)(NR + (N + R)\log(N + R)))$ on the credibility network. This leads to the following lemma:

Lemma 3.2: Both MinCost-MCF and MaxCred-MCF run in time $O(N^{R} R \log(N + R)(NR + (N + R)\log(N + R)))$.

Note that when the number of formats $R$ is fixed, these algorithms are polynomial in $N$. In addition, when $|\mathcal{B}|, |\mathcal{W}| = \omega(N^{R-1} \log(N + R)(NR + (N + R)\log(N + R)))$, these algorithms have lower asymptotic complexity than their dynamic programming equivalents.

C. Leveraging the Structure of the Credibility Function 

The solutions discussed so far do not leverage any structure 
in the problem. Given an event and reporter locations, the 
credibility associated with each report format is computed as 
a number and acts as an input to the algorithms discussed. 
However, there are two interesting structural properties in the 
problem formulation. First, for a given reporter at a given 
location, the credibility is higher for a format whose cost is 
also higher. Second, for reporters at different distances, the 
credibility decays as a function of distance. In this section, 
we ask the question: can we leverage this structure to devise 
efficient approximation algorithms, or optimal special-case 
solutions either for MaxCred or MinCost? 

1) An Efficient Optimal Greedy MaxCred Algorithm for Two Formats: When a social swarming application uses only two report formats (say, text and video), it is possible to devise an optimal greedy MaxCred algorithm. Assume that each of the $N$ reporters can report with one of two formats, $f_1$ or $f_2$; that reporters are indexed so that reporter $i$ is closer to the event than reporter $k$ for $i < k$; and that credibility decays with distance. Furthermore, we assume that $e_1 = \beta > 1$ and $e_2 = 1$, and that $c_{i,1} > c_{i,2}$ for all $i \in \{1, \dots, N\}$: that is, the more expensive format yields a higher credibility.

With these assumptions, the following algorithm, denoted MaxCred-2F, finds an assignment with maximum credibility that falls within a budget $B$ and runs in time $O(N^2)$.



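Algorithm 1 Algorithm MaxCred-2F
INPUT: $(c_{i,j})$: $i \in \{1, \dots, N\}$, $j \in \{1, 2\}$; costs $(e_1, e_2) = (\beta, 1)$; budget $B$.
Define $d_m = c_{m,1} - c_{m,2}$ for each $m \in \{1, \dots, N\}$.
For $i \in \{0, \dots, \min[\lfloor B/\beta \rfloor, N]\}$, do:

1) Define $Y = \min[N - i, \lfloor B - \beta i \rfloor]$.

2) Define the active set $\mathcal{M} = \{1, \dots, i + Y\}$, being the set of $i + Y$ reporters closest to the event.

3) Define $\mathcal{D}_i^*$ as the set of $i$ reporters in $\mathcal{M}$ with the largest $d_m$ values (breaking ties arbitrarily). Then choose format $f_1$ for all reporters $m \in \mathcal{D}_i^*$, choose $f_2$ for $m \in \mathcal{M} \setminus \mathcal{D}_i^*$, and choose "idle" for all $m \notin \mathcal{M}$.

4) Define $C^i_{MAX}$ as the total credibility of this assignment.

OUTPUT: $i^* = \arg\max_i C^i_{MAX}$, together with its assignment, whose credibility is $C_{MAX} = C^{i^*}_{MAX}$.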



The output of this algorithm is the maximum credibility 
assignment of formats to reporters. We can prove that this 
algorithm is optimal. 



Theorem 1: The above algorithm finds the solution $C_{MAX}$ to the MaxCred-2F problem with budget $B$.

Proof: For each $i$, we first seek to find $C^i_{MAX}$, the maximum credibility subject to having exactly $i$ reporters use the expensive format $f_1$. Using a simple interchange argument, together with the fact that the credibility of each format is non-negative and non-increasing in distance, we can show that there exists an optimal solution that activates the set $\mathcal{M}$ consisting of the $i + Y$ reporters closest to the event. Indeed, if an optimal solution does not use the set $\mathcal{M}$, we can swap an idle reporter closer to the event with an active reporter further from the event, without affecting cost or decreasing credibility.

For each subset $\mathcal{D} \subseteq \mathcal{M}$ that contains $i$ reporters, define $C(\mathcal{D})$ as the credibility of the assignment that assigns reporters $m \in \mathcal{D}$ the format $f_1$, assigns the remaining reporters in $\mathcal{M}$ the format $f_2$, and keeps all reporters $m \notin \mathcal{M}$ idle:

$$
C(\mathcal{D}) = \sum_{m \in \mathcal{M}} c_{m,2} + \sum_{m \in \mathcal{D}} d_m
$$

Then $C(\mathcal{D})$ is maximized by the subset $\mathcal{D}_i^*$ containing the $i$ reporters in $\mathcal{M}$ with the largest $d_m$ values. This defines $C^i_{MAX}$, and $C_{MAX}$ is found by maximizing over all possible $i$. ■
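A minimal Python sketch of Algorithm 1, assuming reporters are already sorted by distance and using 0-based reporter indices; the function name is hypothetical. For simplicity it sorts the active set for every $i$, which costs $O(N \log N)$ per iteration rather than the $O(N)$ per iteration implied by the stated $O(N^2)$ bound.

```python
import math


def maxcred_2f(c, beta, B):
    """Sketch of Algorithm 1 (MaxCred-2F) for two formats.

    c[m] = (c_m1, c_m2): credibility of reporter m with the expensive format f1
    (cost beta > 1) and with the cheap format f2 (cost 1).  Reporters must already
    be sorted by distance to the event, closest first.
    Returns (C_MAX, formats) with formats[m] in {1, 2, None} (None = idle).
    """
    n = len(c)
    gain = [c1 - c2 for c1, c2 in c]                   # d_m: benefit of upgrading f2 -> f1
    best = (float("-inf"), None)
    for i in range(min(math.floor(B / beta), n) + 1):  # i = number of f1 reports
        y = min(n - i, math.floor(B - beta * i))       # cheap reports that still fit
        active = range(i + y)                          # the i + Y closest reporters
        upgraded = set(sorted(active, key=gain.__getitem__, reverse=True)[:i])
        formats = [None] * n
        for m in active:
            formats[m] = 1 if m in upgraded else 2
        total = sum(c[m][0] if f == 1 else c[m][1] for m, f in enumerate(formats) if f)
        if total > best[0]:
            best = (total, formats)
    return best
```

For four reporters with $(c_{m,1}, c_{m,2})$ = (3.0, 1.0), (2.0, 1.5), (1.0, 0.8), (0.5, 0.4), $\beta = 3$ and $B = 5$, the best choice is $i = 1$: the closest reporter sends $f_1$ and the next two send $f_2$, for a total credibility of about 5.3.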

We can analogously define a MinCost version for two 
formats, but omit it for brevity. Currently, we have not been 
able to extend this type of algorithm to 3 formats and beyond, 
so this remains an interesting open problem. 

2) A Computationally-Efficient Approximation Algorithm: 
The structure of our credibility function can also be used to 
reduce computational complexity. To understand this, recall 
that the dynamic programming algorithms described above 
jointly optimized both reporter selection and format selection. 
In this section, we describe an approximation algorithm for 
MinCost, called MinCost-CC, where the structure of the 
credibility function is used to determine, for each reporter, 
the format that the reporter should use. As we shall show, 
MinCost-CC has significantly lower run-times at the expense 
of slight non-optimality in its results. 

MinCost-CC is based on the following intuition. Close 
to the location of the event, even low-cost formats have 
reasonable credibility. However, beyond a certain distance, the 
credibility of low-cost formats like text degrades significantly, 
to the point where even the small cost of that format may not 
be justified. Put another way, it is beneficial for a reporter 
to use that format whose credibility per unit cost (hence 
MinCost-CC) is highest — this gives the most "bang for 
the buck". Thus, for a given reporter, its current distance 
d from the location of the event may pre-determine the 
format it uses. Of course, this pre-determination can result 
in a non-optimal choice, which is why MinCost-CC is an 
approximation algorithm. 

Formally, in MinCost-CC, if, for a reporter $i$,

$$
k^* = \arg\max_{k} \frac{c_{i,k}(S_i, L)}{e_k},
$$

then reporter $i$ chooses format $f_{k^*}$. This choice can be precomputed (since it depends only upon the credibility and cost models), but each reporter needs to recalculate its choice of report format whenever its distance to the event in question changes. The event locations that determine the format $f_{k^*}$ chosen by a particular reporter $i$ form annular regions about the reporter.
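The per-reporter format choice of MinCost-CC is a single maximization. The Python sketch below (function name hypothetical, formats indexed from 0) evaluates it with the parameter values used in the evaluation later in this section: $h_0 = 1$, $\gamma = (1, 1, 1, 1)$, $\delta = (2, 1.5, 1, 0.5)$ and $e = (1, 2.2, 5.4, 13.7)$.

```python
def cc_format(d, gammas, deltas, costs, h0):
    """MinCost-CC format choice: index k* maximizing c_ik / e_k at distance d under model (1)."""
    ratios = [g / max(d, h0) ** dl / e for g, dl, e in zip(gammas, deltas, costs)]
    return max(range(len(ratios)), key=ratios.__getitem__)


# Parameter values from the evaluation: h0 = 1, four formats (0-based indices here).
GAMMAS, DELTAS, COSTS = (1, 1, 1, 1), (2, 1.5, 1, 0.5), (1, 2.2, 5.4, 13.7)
choices = [cc_format(d, GAMMAS, DELTAS, COSTS, 1.0) for d in (1.0, 5.0, 6.2, 10.0)]
```

With these values the chosen format changes as the distance grows past roughly 4.84, 6.02 and 6.44 distance units, so the four formats occupy a disk and three nested annuli: the cheapest format wins close to the event, the most expensive one far away. For distances 1, 5, 6.2 and 10, `choices` evaluates to `[0, 1, 2, 3]`.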

Once each reporter has made its format choice, it remains for the director to decide which reporter(s) to select. For MinCost-CC, the minimum cost formulation is identical to (6), with comparable complexity, but with two crucial differences: both the constant $|\mathcal{W}|$ and the runtime now relate only to the number $N$ of reporters, not to $N \times R$. As we shall show below, this makes a significant practical difference in runtime, even for moderate-sized inputs.

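In MinCost-CC, the dynamic programming process of (7) is replaced by

$$
A(l+1, s) = \max\left[A(l, s),\ e_{l+1} + A(l, s - c_{l+1})\right]
\tag{9}
$$

where $c_{l+1}$ and $e_{l+1}$ denote the credibility and cost of the format precomputed by reporter $l + 1$; they replace the per-format terms of (7), since each reporter precomputes its format of choice. The first term corresponds to selecting reporter $l + 1$, and the second to leaving it idle. Compared with (7), the time complexity of (9) is reduced to $O(N|\mathcal{W}|)$, with a much smaller $|\mathcal{W}|$ in general. Notice that this time complexity is independent of $R$, the number of report formats, greatly improving computational efficiency at the expense of some optimality.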
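Once formats are fixed, the selection step is a one-dimensional knapsack over reporters. The Python sketch below implements $A(l+1, s) = \max[A(l, s),\ e_{l+1} + A(l, s - c_{l+1})]$, where the first term selects reporter $l+1$ and the second leaves it idle ($y = 1$), with credibilities in integer units of $\xi$. The function name and the backtracking step are additions for illustration.

```python
def mincost_cc(c, e, C):
    """MinCost-CC selection step for precomputed per-reporter formats.

    c[l], e[l]: credibility (integer units of the discretization unit) and cost of
    reporter l's chosen format.  Returns (min_cost, selected_reporters), or None
    if even selecting every reporter misses the target C.
    """
    n = len(c)
    W = sum(c) - C
    if W < 0:
        return None
    # A[l][s]: max cost saved (sum of e over unselected reporters among the first l)
    # while their total credibility stays <= s, as in the y = 1 - x transformation.
    A = [[0] * (W + 1)]
    for cl, el in zip(c, e):
        prev = A[-1]
        A.append([max(prev[s], el + prev[s - cl]) if cl <= s else prev[s]
                  for s in range(W + 1)])
    selected, s = [], W
    for l in range(n, 0, -1):                # backtrack from A(N, W)
        if A[l][s] != A[l - 1][s]:           # reporter l - 1 was left idle (y = 1)
            s -= c[l - 1]
        else:
            selected.append(l - 1)
    return sum(e) - A[n][W], sorted(selected)
```

For example, `mincost_cc([3, 2, 1], [4, 1, 1], 3)` returns `(2, [1, 2])`: the two cheap reporters together reach the target more cheaply than the single expensive one.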

Using steps similar to those presented in Section III-B, it is possible to define a MaxCred-CC approximation algorithm for maximizing credibility. We omit the details for brevity, but note that MinCost-CC and MaxCred-CC still have weakly-polynomial asymptotic complexity, yet are computationally much more efficient than MinCost-DP and MaxCred-DP.

Evaluation of MinCost-CC. The approximation algorithms discussed above trade off optimality for reduced computational complexity. As such, it is important to quantify this trade-off for practical swarm configurations. In this section, we compare MinCost-CC with MinCost-DP²; lack of space prevents a more extensive evaluation, but we expect our conclusions to hold in general.

Lacking data from social swarming applications, we use two different data sets. First, we carefully³ mine Google News manually for interesting events. Searching Google News for a specific set of keywords describing an event retrieves a list of news items related to that event within 24 hours of its occurrence. The event location is explicitly specified in the news items. Each news item has a location, which is assumed to be the location of a reporter. We use the event location and report locations as inputs to MinCost-CC and MinCost-DP. In this paper, we present results from three events: an event of regional scope (a basketball playoff game between the Lakers and the Jazz, 31 reporters); an event of national scope (the passage of the healthcare reform bill, 63 reporters); and an event of global scope (the opening of the Shanghai exposition, 88 reporters). Of course, this choice of a surrogate for social swarming is far from perfect. However, this data set gives a varied, realistic reporter location distribution; since our algorithms depend heavily on location, we can draw some reasonable conclusions about their relative performance. That said, we also use a dataset generated from a random distribution of reporters, both to ensure that we are not misled by the Google News data set and to explore the impact of larger swarm sizes.

² In some of our evaluations, we use R = 4. In this regime, MinCost-DP is more efficient than MinCost-MCF, hence the choice.

³ We re-scaled reporter distances and performed several data cleaning operations: removing blog posts, handling duplicate reports, etc. We omit a discussion of these for brevity.

Fig. 1. Minimal cost of 4 formats with increasing k. (a) Optimality gap. (b) Runtime (in microseconds).

We are interested in two metrics: the optimality gap, which 
is the ratio of the min-cost obtained by MinCost-CC to that 
obtained by MinCost-DP; and the runtime of the computation 
for each of these algorithms. 
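The optimality gap can be computed as in the short Python sketch below. The text defines the gap as a ratio but reports it as a percentage; the sketch assumes the percentage is the ratio minus one (so a ratio of 1.197 corresponds to 19.7%), and it skips values of $k$ for which no feasible reporter set exists. The function names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def optimality_gap(cost_cc: float, cost_dp: float) -> float:
    """Relative excess of the MinCost-CC cost over the optimal MinCost-DP cost."""
    if cost_dp <= 0:
        raise ValueError("the optimal cost must be positive")
    return cost_cc / cost_dp - 1.0


def average_gap(costs_cc: Sequence[Optional[float]],
                costs_dp: Sequence[Optional[float]]) -> float:
    """Average gap over thresholds k, ignoring saturated (infeasible) thresholds."""
    gaps = [optimality_gap(cc, dp)
            for cc, dp in zip(costs_cc, costs_dp)
            if cc is not None and dp is not None]
    return mean(gaps)


# Illustrative costs for k = 1, 2, 3; k = 3 is saturated.
print(round(average_gap([12.0, 22.0, None], [10.0, 20.0, None]), 3))  # 0.15
```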

Figure 1 plots these two metrics as a function of the credibility threshold, expressed as a number $k$. A value $k$ represents a credibility threshold corresponding to the total credibility of $k$ reports of the highest-cost format from a
distance $h_0$ (e.g., if $k$ is 3 and the highest-cost format is video, then the director is interested in obtaining credibility equivalent to that from three video reports). In this graph, we use four data formats, with $h_0$, $\gamma_{1-4}$, $\xi_{1-4}$, and the corresponding $e_{1-4}$ set to 1, (1, 1, 1, 1), (2, 1.5, 1, 0.5), and (1, 2.2, 5.4, 13.7), respectively. We have evaluated different numbers of data formats and different parameter settings and have obtained qualitatively similar results, but omit a discussion of these for lack of space.

From Figure 1(a), the optimality gap is, on average, 19.7% across different values of $k$. This is encouraging, since it suggests that MinCost-CC produces results that are not significantly far from the optimal. Interestingly, no feasible solution (and hence no optimal one) exists for $k > 5$ for the regional event: the credibility threshold "saturates", since no set of reporters can collectively satisfy it. Other events saturate at different values of $k$. Finally, while this is not apparent from these graphs, the minimum cost is approximately linear in $k$ for both MinCost-CC and MinCost-DP.

More interestingly, Figure 1(b) shows that the runtime of MinCost-CC is 2-3 orders of magnitude lower than that of MinCost-DP with the discretization setting $|W| = 10000$. This difference is not just a matter of degree; it may make the difference between a useful application and one that is not: MinCost-DP can take several tens of seconds to complete, while MinCost-CC takes at most a few hundred milliseconds, which might make the difference between victory and defeat in a balloon hunt, or between life and death in a disaster response swarm! The performance difference is explained by the lower asymptotic complexity of MinCost-CC. A subtle finding is that the running time of both MinCost-CC and MinCost-DP decreases with increasing $k$, sometimes dramatically in the case of MinCost-CC. Intuitively, this is because fewer candidate sets of reporters can satisfy a higher credibility threshold, resulting in a smaller search space.
Fig. 2. Minimal cost in random topologies with increasing $k$: (a) optimality gap; (b) runtime (in microseconds). Error bars in (b) are very small and are therefore omitted.



For random topologies, Figure 2 plots the optimality gap and runtime, averaged over 50 simulations. Across different values of $k$, MinCost-CC is on average 20.5% and 17.4% off the optimal for 100 and 200 reporters, respectively, but is still 2-3 orders of magnitude more efficient than MinCost-DP. The runtimes of both algorithms are slightly higher, given the larger number of reporters. Moreover, with 100 or 200 reporters, the optimality gap has the same upper bound, about 35% for large $k$. This is also observed in other simulations with 50, 150, and 300 reporters (not shown). We leave an analytical exploration of this upper bound to future work. Finally, a comparison of these results with Figure 1(a) reveals an interesting result. Although different reporter deployments can result in different optimality gap curves (the curves for the three Google News events in Figure 1(a) are not the same), the national event exhibits an optimality gap curve qualitatively similar to that of the random topologies, suggesting that its reporter deployment resembles a random one. Understanding this in greater depth is also left to future work.

D. Extensions 

Incorporating sources of noise into our algorithms is straightforward, so we mention it only briefly. Recall that we model a noise source as increasing a reporter's effective distance. Since our optimal algorithms, like MinCost-DP or MinCost-MCF, are agnostic to the structure of the credibility function, they apply unchanged in the presence of noise. For an algorithm like MinCost-CC, which does take this structure into account, each reporter can quantify its ambient noise and use the resulting effective distance to calculate the report format to use.

Finally, our algorithms can, in general, deal with monotonically increasing corroboration functions, where the total credibility of a collection of reporters may be a non-linear function of the individual credibilities. If $I(\cdot)$ were to represent a monotonically increasing credibility function, we need only use $I(c)$ in place of $c$ in our dynamic programming formulation. For example, (9) would become

$$A(l+1, s) = \max\{A(l, s),\; c_{l+1} + A(l, I(s - e_{l+1}))\}$$

Similar changes can be applied to other dynamic programming formulations.

IV. The Renewals Problem: Randomly Arriving Events 

In the previous section, we discussed a one-shot problem: that of optimizing reporter selection for a single event. We now consider a sequence of events that arrive at times $\{t_1, t_2, t_3, \ldots\}$, where $t_k$ is a real number that represents the arrival time of event $k$. We assume that $t_k < t_{k+1}$ for all $k$. In this setting, we consider a stochastic variant of MaxCred, called MaxCred-Stochastic: instead of maximizing credibility for a single event subject to a cost constraint, we maximize the average credibility per event subject to an average cost constraint and a per-event credibility minimum. This couples the decisions made across events. Nevertheless, we first show that this time-average problem can be solved by a reduction to individual knapsack problems of the type described in previous sections. We then show that if the per-event credibility minimum is removed, decisions can be made in a decentralized fashion. Specifically, after processing each event, the swarm director passes a weight to all reporters. The reporters then make uncoordinated decisions when processing the next event, without any intervention from the swarm director. We derive a similar distributed version for stochastic MinCost, labelled MinCost-Stochastic.

We start by solving the general time-average problem using Lyapunov optimization [9], which can handle MaxCred-Stochastic, MinCost-Stochastic, as well as variations with more general constraints.

A. The General Stochastic Problem 

Let $\omega[k]$ represent a random vector of parameters associated with each event $k$, such as the location of the event and the corresponding costs and credibilities. While $\omega[k]$ can include different parameters for different types of problems, we shall soon use $\omega[k] = [(c_{ij}[k]), (e_j[k])]$, where $(c_{ij}[k])$ is the matrix of event-$k$ credibility values for reporters $i \in \{1, \ldots, N\}$ and formats $f_j \in \{f_1, \ldots, f_R\}$, and $(e_j[k])$ is a vector of cost information. We assume the process $\omega[k]$ is ergodic with a well-defined steady-state distribution. The simplest example is when $\omega[k]$ is independent and identically distributed (i.i.d.) over events $k \in \{1, 2, 3, \ldots\}$.

Let frame $k$ denote the period of time $[t_k, t_{k+1})$, which starts with the arrival of event $k$ and ends just before the next event. For every frame $k$, the director observes $\omega[k]$ and chooses a control action $a[k]$ from a general set of feasible actions $\mathcal{A}_{\omega[k]}$ that possibly depends on $\omega[k]$. The values $\omega[k]$ and $a[k]$ together determine an $(M+1)$-dimensional vector $y[k]$, representing network attributes for event $k$:

$$y[k] = (y_0[k], y_1[k], \ldots, y_M[k])$$

Specifically, each attribute $y_m[k]$ is given by a general function of $a[k]$ and $\omega[k]$:

$$y_m[k] = y_m(a[k], \omega[k]) \quad \forall m \in \{0, 1, \ldots, M\}$$

The functions $y_m(a[k], \omega[k])$ are arbitrary (possibly non-linear, non-convex, discontinuous), and are only assumed to be bounded.

Define $\bar{y}_m$ as the time-average expectation of the attribute $y_m[k]$, averaged over all frames (assuming temporarily that the limit exists):

$$\bar{y}_m \triangleq \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \mathbb{E}\{y_m[k]\}$$
The general problem is to find an algorithm for choosing control actions $a[k]$ for each frame $k \in \{1, 2, 3, \ldots\}$ to solve:

$$\text{Minimize:} \quad \bar{y}_0 \qquad (10)$$
$$\text{Subject to:} \quad 1)\ \bar{y}_m \le 0 \quad \forall m \in \{1, 2, \ldots, M\} \qquad (11)$$
$$2)\ a[k] \in \mathcal{A}_{\omega[k]} \quad \forall \text{ frames } k \in \{1, 2, \ldots\} \qquad (12)$$

The solution to the general problem is given in terms of a positive parameter $V$, which affects a performance tradeoff. Specifically, for each of the $M$ time-average inequality constraints $\bar{y}_m \le 0$, define a virtual queue $Z_m[k]$ with $Z_m[0] = 0$ and with frame-update equation:

$$Z_m[k+1] = \max[Z_m[k] + y_m[k],\ 0] \qquad (13)$$



Then, every frame $k$, observe the value of $\omega[k]$ and perform the following actions:

• Choose $a[k] \in \mathcal{A}_{\omega[k]}$ to minimize:
$$V y_0(a[k], \omega[k]) + \sum_{m=1}^{M} Z_m[k]\, y_m(a[k], \omega[k])$$

• Update the virtual queues $Z_m[k]$ according to (13), using the values $y_m[k] = y_m(a[k], \omega[k])$ determined from the above minimization.
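A minimal Python sketch of one frame of this procedure follows. It assumes the feasible set $\mathcal{A}_{\omega[k]}$ is small enough to enumerate and that a hypothetical callback `attributes(a)` returns the vector $(y_0, \ldots, y_M)$ for action `a` under the observed $\omega[k]$. Exhaustive minimization stands in for whatever per-frame solver a real implementation would use; all names are illustrative.

```python
from typing import Any, Callable, List, Sequence, Tuple


def dpp_frame(actions: Sequence[Any],
              attributes: Callable[[Any], Sequence[float]],
              z: List[float], v: float) -> Tuple[Any, List[float]]:
    """One frame: choose a[k] by the rule above, then apply update (13).

    attributes(a) must return (y_0, y_1, ..., y_M) for action a under the
    observed omega[k]; z holds the current virtual queues Z_1[k], ..., Z_M[k].
    """

    def score(a: Any) -> float:
        y = attributes(a)
        # V * y_0 + sum over m of Z_m[k] * y_m
        return v * y[0] + sum(z_m * y_m for z_m, y_m in zip(z, y[1:]))

    best = min(actions, key=score)  # ties go to the first action listed
    y = attributes(best)
    # Z_m[k+1] = max(Z_m[k] + y_m[k], 0)
    z_next = [max(z_m + y_m, 0.0) for z_m, y_m in zip(z, y[1:])]
    return best, z_next


# Toy instance (illustrative numbers): "send" yields credibility 5 at cost 3 and
# the average cost budget is 1, so y_0 = -credibility and y_1 = cost - 1.
toy = {"idle": (0.0, -1.0), "send": (-5.0, 2.0)}
queues = [0.0]
for frame in range(4):
    action, queues = dpp_frame(list(toy), toy.__getitem__, queues, v=1.0)
    print(frame, action, queues)
```

The queue grows when the expensive action is taken and drains otherwise; if the loop is run longer, this toy settles into sending once every three frames, which meets the average cost budget of 1 exactly. A larger `v` weights credibility more heavily against the cost queue.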

Assuming the problem is feasible (so that it is possible to meet the time-average inequality constraints), this algorithm will also meet all of these constraints and will achieve a time-average value $\bar{y}_0$ that is within $O(1/V)$ of the optimum. Typically, the $V$ parameter also affects the average size of the virtual queues (these can be shown to be $O(V)$), which directly affects the convergence time needed for the time averages to be close to their limiting values. The proofs of these claims follow the theory developed in [9]–[11], with minor notational adjustments needed to change the timeslot averages there to frame averages here. Specifically, the work in [9]–[11] considers i.i.d. events $\omega[k]$, but the same holds for more general ergodic events [12].



B. Corroboration Pull as a Dynamic Optimization Problem 

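Here we formulate MaxCred-Stochastic. Define $\omega[k] = [(c_{ij}[k]), (e_j[k])]$ (representing the credibilities and costs of event $k$), and define $a[k] = (x_{ij}[k])$, where $x_{ij}[k]$ is a binary variable that is 1 if reporter $i \in \{1, \ldots, N\}$ uses format $f_j \in \{f_1, \ldots, f_R\}$ on frame $k$. The goal is to maximize the average credibility per frame subject to average cost constraints and to a minimum credibility level required on each frame $k \in \{1, 2, \ldots\}$:

$$\text{Maximize:} \quad \bar{c} \qquad (14)$$
$$\text{Subject to:} \quad \bar{e} \le e_{av} \qquad (15)$$
$$\sum_{i=1}^{N} \sum_{j=1}^{R} x_{ij}[k]\, c_{ij}[k] \ge c_{min} \quad \forall \text{ frames } k \qquad (16)$$
$$x_{ij}[k] \in \{0, 1\} \quad \forall i, j, \ \forall \text{ frames } k \qquad (17)$$
$$\sum_{j=1}^{R} x_{ij}[k] \le 1 \quad \forall i, \ \forall \text{ frames } k \qquad (18)$$

where $e_{av}$ and $c_{min}$ are given non-negative constants, and $\bar{c}$ and $\bar{e}$ are defined:

$$\bar{c} \triangleq \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{j=1}^{R} \mathbb{E}\{x_{ij}[k]\, c_{ij}[k]\}$$
$$\bar{e} \triangleq \lim_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \sum_{i=1}^{N} \sum_{j=1}^{R} \mathbb{E}\{x_{ij}[k]\, e_j[k]\}$$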

This problem fits the general stochastic optimization framework of the previous subsection by defining $y_0[k]$ and $y_1[k]$ by:

$$y_0[k] = y_0(a[k], \omega[k]) \triangleq -\sum_{i=1}^{N} \sum_{j=1}^{R} x_{ij}[k]\, c_{ij}[k]$$
$$y_1[k] = y_1(a[k], \omega[k]) \triangleq -e_{av} + \sum_{i=1}^{N} \sum_{j=1}^{R} x_{ij}[k]\, e_j[k]$$

and by defining the set $\mathcal{A}_{\omega[k]}$ as the set of all $(x_{ij}[k])$ matrices that satisfy constraints (16)–(18). The resulting stochastic algorithm thus defines a virtual queue $Z_1[k]$ with update:

$$Z_1[k+1] = \max\left[Z_1[k] - e_{av} + \sum_{i=1}^{N} \sum_{j=1}^{R} x_{ij}[k]\, e_j[k],\ 0\right]$$



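and then observing the $\omega[k]$ parameters every frame $k$ and choosing $(x_{ij}[k])$ to solve:

$$\text{Minimize:} \quad \sum_{i=1}^{N} \sum_{j=1}^{R} x_{ij}[k]\,\big[Z_1[k]\, e_j[k] - V c_{ij}[k]\big] \qquad (19)$$
$$\text{Subject to:} \quad \sum_{i=1}^{N} \sum_{j=1}^{R} x_{ij}[k]\, c_{ij}[k] \ge c_{min} \qquad (20)$$
$$x_{ij}[k] \in \{0, 1\} \ \forall i, j, \qquad \sum_{j=1}^{R} x_{ij}[k] \le 1 \ \forall i \qquad (21)$$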

The per-frame problem (19)–(21) has exactly the same structure as the one-shot MinCost problem described in previous sections, except that the cost weights are changed from $e_j[k]$ to $Z_1[k] e_j[k] - V c_{ij}[k]$, which can be negative. However, the same knapsack technique can be used to solve it (possibly by shifting the function to be minimized by a positive constant to make the resulting terms non-negative). Further, we note that the performance of the stochastic algorithm degrades gracefully when approximate implementations are used, such as implementations whose per-frame solution is within a multiplicative constant of the optimal knapsack solution [9]. Thus, the simple MinCost-CC heuristic can be used here.
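For very small swarms, the per-frame problem (19)–(21), including the $c_{min}$ constraint, can be checked by exhaustive search, as in the Python sketch below. This is deliberately not the knapsack technique referred to in the text: it enumerates all $(R+1)^N$ assignments (each reporter picks one format or stays idle), so it is only useful for validating a faster solver on toy inputs. The argument layout (`cred[i][j]` for $c_{ij}[k]$, `cost[j]` for $e_j[k]$) and the function name are assumptions.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Choice = List[Optional[int]]  # format index per reporter, None = idle


def solve_frame_bruteforce(cred: Sequence[Sequence[float]], cost: Sequence[float],
                           z1: float, v: float,
                           c_min: float) -> Optional[Tuple[Choice, float]]:
    """Exhaustively minimize (19) subject to (20)-(21) for one frame."""
    n, r = len(cred), len(cost)
    best: Optional[Tuple[Choice, float]] = None
    # Each reporter uses at most one format, which enforces (21).
    for choice in product([None, *range(r)], repeat=n):
        active = [(i, j) for i, j in enumerate(choice) if j is not None]
        # Constraint (20): the frame must reach the minimum credibility.
        if sum(cred[i][j] for i, j in active) < c_min:
            continue
        # Objective (19): weights Z_1[k] e_j[k] - V c_ij[k], possibly negative.
        obj = sum(z1 * cost[j] - v * cred[i][j] for i, j in active)
        if best is None or obj < best[1]:
            best = (list(choice), obj)
    return best  # None if no assignment satisfies (20)


cred = [[2.0, 4.0], [1.0, 3.0]]  # c_ij[k] for 2 reporters x 2 formats
cost = [1.0, 3.0]                # e_j[k]
print(solve_frame_bruteforce(cred, cost, z1=2.0, v=1.0, c_min=3.0))  # ([0, 0], 1.0)
```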

A simple and exact distributed implementation arises if the $c_{min}$ constraint (16) is removed (i.e., if $c_{min} = 0$). In this case, the frame-$k$ decisions (19)–(21) are separable over reporters and reduce to having each reporter $i \in \{1, \ldots, N\}$ solve:

$$\text{Minimize:} \quad \sum_{j=1}^{R} x_{ij}[k]\,\big[Z_1[k]\, e_j[k] - V c_{ij}[k]\big]$$
$$\text{Subject to:} \quad x_{ij}[k] \in \{0, 1\} \ \forall j, \qquad \sum_{j=1}^{R} x_{ij}[k] \le 1$$

That is, each reporter $i$ chooses the single format $f_j \in \{f_1, \ldots, f_R\}$ with the smallest value of $Z_1[k] e_j[k] - V c_{ij}[k]$, breaking ties arbitrarily and choosing to be idle (with $x_{ij}[k] = 0$ for all $j \in \{1, \ldots, R\}$) if all of the weights $Z_1[k] e_j[k] - V c_{ij}[k]$ are positive. The swarm director observes the outcomes of the decisions on frame $k$ and iterates the $Z_1[k]$ update, passing the weight $Z_1[k+1]$ to all reporters before the next event occurs.
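The decentralized rule can be sketched in Python as two small functions: one run by each reporter and one run by the swarm director. This is a sketch under the assumption $c_{min} = 0$, with `cost[j]` standing for $e_j[k]$, `cred_i[j]` for $c_{ij}[k]$, and `e_av` for $e_{av}$; the function names are hypothetical. Ties between formats go to the lowest index, and a zero weight (a tie with staying idle) is resolved in favor of idling, which is one of the arbitrary tie-breaking choices the text allows.

```python
from typing import Optional, Sequence


def reporter_choice(cred_i: Sequence[float], cost: Sequence[float],
                    z1: float, v: float) -> Optional[int]:
    """Reporter i: the format j minimizing Z_1[k] e_j[k] - V c_ij[k], or None (idle)."""
    weights = [z1 * e_j - v * c_ij for e_j, c_ij in zip(cost, cred_i)]
    j_best = min(range(len(weights)), key=weights.__getitem__)
    # Report only if that weight is negative; otherwise idling is at least as good.
    return j_best if weights[j_best] < 0 else None


def director_update(z1: float, choices: Sequence[Optional[int]],
                    cost: Sequence[float], e_av: float) -> float:
    """Swarm director: Z_1[k+1] = max(Z_1[k] - e_av + total cost spent on frame k, 0)."""
    spent = sum(cost[j] for j in choices if j is not None)
    return max(z1 - e_av + spent, 0.0)


cost = [1.0, 3.0]                # e_j[k]: a cheap and an expensive format
cred = [[2.0, 4.0], [1.0, 3.0]]  # c_ij[k] for two reporters (kept fixed here)
z1, v, e_av = 0.0, 1.0, 2.0
for frame in range(3):
    choices = [reporter_choice(cred_i, cost, z1, v) for cred_i in cred]
    z1 = director_update(z1, choices, cost, e_av)  # broadcast before the next event
    print(frame, choices, z1)
```

Only the scalar $Z_1[k]$ crosses the network each frame; every reporter evaluates its own $R$ weights locally, which is what makes the rule decentralized.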
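C. MinCost-Stochastic

MinCost-Stochastic can be formulated as follows:

$$\text{Minimize:} \quad \bar{e}$$
$$\text{Subject to:} \quad \bar{c} \ge c_{av}, \text{ and constraints (17), (18)}$$

We can thus define $y_0[k]$ and $y_1[k]$ as:

$$y_0[k] = y_0(a[k], \omega[k]) \triangleq \sum_{i=1}^{N} \sum_{j=1}^{R} x_{ij}[k]\, e_j[k]$$
$$y_1[k] = y_1(a[k], \omega[k]) \triangleq c_{av} - \sum_{i=1}^{N} \sum_{j=1}^{R} x_{ij}[k]\, c_{ij}[k]$$

from which a similar distributed solution can be obtained.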
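To illustrate why a distributed solution follows, one can substitute the attributes into the per-frame expression $V y_0 + Z_1 y_1$ of the general algorithm. This worked derivation is our own illustration; it assumes $y_0[k] = \sum_{i}\sum_{j} x_{ij}[k] e_j[k]$ and $y_1[k] = c_{av} - \sum_{i}\sum_{j} x_{ij}[k] c_{ij}[k]$, which encodes $\bar{c} \ge c_{av}$ as $\bar{y}_1 \le 0$:

$$
\begin{aligned}
V y_0[k] + Z_1[k]\, y_1[k]
&= V \sum_{i=1}^{N}\sum_{j=1}^{R} x_{ij}[k]\, e_j[k] + Z_1[k]\Big(c_{av} - \sum_{i=1}^{N}\sum_{j=1}^{R} x_{ij}[k]\, c_{ij}[k]\Big) \\
&= \sum_{i=1}^{N}\sum_{j=1}^{R} x_{ij}[k]\,\big[V e_j[k] - Z_1[k]\, c_{ij}[k]\big] + Z_1[k]\, c_{av},
\end{aligned}
$$

$$Z_1[k+1] = \max\left[Z_1[k] + c_{av} - \sum_{i=1}^{N}\sum_{j=1}^{R} x_{ij}[k]\, c_{ij}[k],\ 0\right].$$

Since $Z_1[k] c_{av}$ does not depend on the decisions and constraints (17)–(18) act on each reporter separately, each reporter would pick the format with the smallest weight $V e_j[k] - Z_1[k] c_{ij}[k]$, or stay idle if all such weights are positive, mirroring the MaxCred-Stochastic rule with the roles of cost and credibility exchanged.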

V. Related Work 

We are not aware of any prior work in the wireless 
networking literature that has tackled information credibility 
assessment. 

However, other fields have actively explored credibility, defined as the believability of sources or information |Q~3), [13], [16]. Credibility has been investigated in a number of fields, including information science, human communication, human-computer interaction (HCI), marketing, and psychology, among others [17]. In general, research has focused on two threads: the factors that affect credibility, and the dynamics of information credibility.

The seminal work of Hovland et al. [14] may be the earliest attempt to explore credibility; it discusses how the various characteristics of a source can affect a recipient's acceptance of a message, in the context of human communication. Rieh, Hilligoss, and others explore important dimensions of credibility in the context of social interactions U_3), IfTTl, [18], such as trustworthiness, expertise, and information validity. McKnight and Kacmar [13] study a unifying framework of credibility assessment in which three distinct levels of credibility are discussed: construct, heuristics, and interaction. Their work is in the context of assessing the credibility of websites as sources of information.

Wright and Laskey [19] discuss the fusion of credible information. They present a weighting-based, probabilistic model to compute uncertain information credibility from diverse sources. This model is combined with several techniques, such as the use of prior information, evidence when available, and opportunities for learning from data.

Sometimes, the terms credibility and trust are used synonymously. However, they are distinct notions: while trust refers to beliefs and behaviors associated with the acceptance of risk, credibility refers to the believability of a source, and a believable source may or may not result in associated trusting behaviors IfTTl.

Finally, there is a body of work that has examined processes and propagation of credible information. Corroboration as a process of credibility assessment is discussed in [20]. Proximity, both geographic and social, and its role in credibility assessment are discussed in Q; our use of geographic distance as a measure of credibility is related to this discussion. Saavedra et al. [21] explore the dynamics and the emergence of synchronicity in decision-making when traders use corroboration as a mechanism for trading decisions.



VI. Conclusions and Future Work 

In this paper, we have explored the design space of algorithms for a new problem, optimizing pull corroboration in an emerging application area, social swarming. We have proposed optimal special-case algorithms, computationally efficient approximations, and decentralized optimal stochastic variants. However, our work is merely an initial foray into a broad and unexplored space, with several directions for future work: increasing credibility and cost model realism, incorporating malice, allowing peers to relay reports, and exploring other realistic, yet efficient and near-optimal special-case solutions.